Cost-Based Query Optimization for Complex Pattern Mining on Multiple Databases
Abstract
Mining frequent patterns across multiple datasets has recently attracted considerable research interest. In this paper, we investigate cost-based query optimization approaches for evaluating such mining tasks efficiently. Specifically, we make the following contributions: 1) We present a rich class of queries for mining frequent itemsets across multiple datasets, supported by an SQL-based mechanism. 2) We present an approach to enumerate all possible query plans for these mining queries and, building on this enumeration algorithm, develop a dynamic programming approach and a branch-and-bound approach that find optimal query plans, i.e., plans with the least mining cost. 3) We introduce models to estimate the cost of individual mining operators. 4) We evaluate our query optimization techniques on both real and synthetic datasets and show significant performance improvements.
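The abstract describes enumerating query plans and searching them with dynamic programming and branch-and-bound under a per-operator cost model. The sketch below illustrates that idea on a deliberately simplified, hypothetical model that is not the paper's own. A query is assumed to be a list of support constraints on different datasets (e.g., "frequent in A and B but infrequent in C"). A plan is assumed to mine one dataset that has a positive constraint and then filter the resulting candidates against the remaining datasets in some order. Every cost, cardinality, and selectivity is an assumed input, and selectivities are assumed independent. The names `Cond`, `plan_cost`, `dp_plan`, and `bnb_plan` are hypothetical.

```python
from dataclasses import dataclass
from math import inf, prod


@dataclass(frozen=True)
class Cond:
    """Hypothetical support constraint on one dataset (illustrative only)."""
    name: str
    positive: bool      # sup >= t can seed a mining run; sup < t cannot
    mine_cost: float    # assumed cost of mining this dataset from scratch
    out_card: float     # assumed number of itemsets that first mining run yields
    count_cost: float   # assumed cost of counting one candidate's support here
    selectivity: float  # assumed fraction of candidates that survive this filter


def plan_cost(conds, order):
    """Cost of one plan: mine with order[0], then filter by the rest in order."""
    first = conds[order[0]]
    if not first.positive:
        return inf                           # a negative constraint cannot seed mining
    cost, card = first.mine_cost, first.out_card
    for i in order[1:]:
        cost += conds[i].count_cost * card   # count support of every survivor
        card *= conds[i].selectivity         # independence assumption
    return cost


def dp_plan(conds):
    """Dynamic programming over (applied-subset bitmask, seed) states."""
    n = len(conds)
    best = {}  # (mask, seed) -> (cost, order)
    for s, c in enumerate(conds):
        if c.positive:
            best[(1 << s, s)] = (c.mine_cost, (s,))
    for mask in range(1, 1 << n):            # transitions only add bits, so they
        for seed in range(n):                # always target a larger mask
            if (mask, seed) not in best:
                continue
            cost, order = best[(mask, seed)]
            # Candidate count depends only on the seed and the filters applied.
            card = conds[seed].out_card * prod(
                conds[j].selectivity for j in range(n)
                if mask >> j & 1 and j != seed)
            for nxt in range(n):
                if mask >> nxt & 1:
                    continue
                key = (mask | 1 << nxt, seed)
                cand = (cost + conds[nxt].count_cost * card, order + (nxt,))
                if key not in best or cand[0] < best[key][0]:
                    best[key] = cand
    full = (1 << n) - 1
    return min((best[(full, s)] for s in range(n) if (full, s) in best),
               default=(inf, ()))


def bnb_plan(conds):
    """Depth-first branch-and-bound over plan orders (all step costs >= 0)."""
    n = len(conds)
    best = [inf, ()]

    def extend(order, cost, card):
        if cost >= best[0]:
            return                           # prune: cannot beat the incumbent plan
        if len(order) == n:
            best[0], best[1] = cost, order
            return
        for nxt in range(n):
            if nxt not in order:
                extend(order + (nxt,), cost + conds[nxt].count_cost * card,
                       card * conds[nxt].selectivity)

    for s, c in enumerate(conds):
        if c.positive:
            extend((s,), c.mine_cost, c.out_card)
    return best[0], best[1]


# Hypothetical query: frequent in A and in B, but infrequent in C.
query = [
    Cond("sup_A >= 0.05", True, mine_cost=1000, out_card=5000,
         count_cost=0.2, selectivity=0.6),
    Cond("sup_B >= 0.05", True, mine_cost=400, out_card=20000,
         count_cost=0.05, selectivity=0.3),
    Cond("sup_C < 0.02", False, mine_cost=0.0, out_card=0.0,  # unused: cannot seed
         count_cost=0.1, selectivity=0.5),
]

if __name__ == "__main__":
    cost, order = dp_plan(query)
    print([query[i].name for i in order])
    # → ['sup_A >= 0.05', 'sup_B >= 0.05', 'sup_C < 0.02']
```

Under this assumed model the estimated candidate count depends only on the seed and on which filters have already been applied, so dynamic programming over (subset, seed) states finds the cheapest order exactly. Branch-and-bound instead explores orders depth-first and abandons any partial plan whose cost already reaches the best complete plan, which is safe only because every step adds nonnegative cost. The paper's actual operators and cost formulas may differ substantially.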